AUTOREGRESSIVE MODEL LEARNING DEVICE FOR TIME-SERIES
DATA AND A DEVICE TO DETECT OUTLIER AND CHANGE POINT
USING THE SAME

BACKGROUNDS OF THE INVENTION

1. Field of the Invention

The present invention relates to an autoregressive model
learning device for time-series data and a device to detect
outliers and change points using the same. It particularly
relates to a detection device, associated with data analysis
and data mining technologies, that calculates the outlier
score and the change point score for sequentially input data
described with discrete and/or continuous variates so as to
detect outliers and change points with high accuracy.

2. Description of the Related Art

Conventionally, this type of detection device, which
calculates the outlier score and the change point score of
time-series data to detect outliers and change points, uses
technologies from the fields of statistics, machine learning,
data mining and others. In other words, the abnormal value
detection and change point detection that the present
invention realizes have conventionally been addressed in
those fields. The present invention, however, is applied to
situations where stationarity is not assumed for the data
generation source or the information source.

Literature on outlier detection in such a case includes the
following:

One example is a method by P. Burge and J. Shawe-Taylor
called "Detecting cellular fraud using adaptive prototypes"
(Proceedings of AI Approaches to Fraud Detection and Risk
Management, pp. 9-13, 1997).

Another example is a method by K. Yamanishi titled "On-line
Unsupervised Outlier Detection Using Finite Mixtures with
Discounting Learning Algorithms" (Proc. of the Sixth ACM
SIGKDD International Conference on Knowledge Discovery and
Data Mining, ACM Press, pp. 320-324, 2000).

Still another example is a method by U. Murad and G. Pinkas
called "Unsupervised profiling for identifying superimposed
fraud" (Proceedings of 3rd European Conference on Principles
and Practice of Knowledge Discovery in Databases,
pp. 251-261, 1999).

These materials use adaptive outlier detection algorithms to
handle non-stationarity.

Further, in an ordinary known method for detecting change
points in statistics, the number of change points in the
given data is decided in advance, and a model is fitted on
the assumption that the data between change points can be
described by a stationary model. Such a method is described,
for example, in the following literature.

An example is a paper by B. Guthery titled "Partition
regression" in Journal of American Statistical Association
(69:945-947, 1974) or a paper by M. Huskova titled
"Nonparametric procedures for detecting a change in simple
linear regression models" in the book titled "Applied Change
Point Problems in Statistics" (Nova Science Publishers, Inc.,
1995).

For detection of change points in data mining, a method by
V. Guralnik and J. Srivastava is described in "Event
detection from time series data" (Proc. of the ACM SIGKDD
International Conference on Knowledge Discovery and Data
Mining, ACM Press, pp. 32-42, 1999).

The conventional methods and devices according to 
the above literature have drawbacks as follows as a 
device to detect outliers and change points from the 
time-series data. 

In the outlier detection methods that allow sequential
processing with conventional machine learning technology,
such as the method by P. Burge and J. Shawe-Taylor, the
method by K. Yamanishi et al., or the method by U. Murad and
G. Pinkas described above, no statistical model suitable for
time-series data is used. Therefore, there is a drawback that
the characteristics of data having a time-series nature
cannot be grasped sufficiently. A statistical model suitable
for time-series data here means a model that can express
correlation among data at different times. For example, the
autoregressive model and the Markov model are models of this
type.

In addition, the conventional change point detection method
described in the paper by V. Guralnik and J. Srivastava
basically uses collective processing of data, or so-called
batch processing, and cannot process the data sequentially.
Further, the conventional change point detection methods
described above are designed on the assumption that the data
are locally stationary, but such an assumption is not
appropriate in reality and should be removed.

Further, though it is preferable in applications such as data
mining to handle outliers and change points together and
detect each of them, no scheme that handles them together has
been known so far.

SUMMARY OF THE INVENTION 

An object of the present invention is to solve 
the above problems. More specifically, it is an object 
of the present invention to use a statistic model that 
can grasp the nature of the time-series data, to support 
non-stationary data, to handle the outliers and the 
change points together and to sequentially execute the 
processing. 



According to the first aspect of the invention, an
autoregressive model learning device that sequentially reads
a data string of real number vector values and learns the
probability distribution generating the data string using an
autoregressive model comprises

a data updating device that updates the sufficient statistic
of the autoregressive model using newly read data while
forgetting the past data, and

a parameter calculator that reads the sufficient statistic
updated by the data updating device and calculates the
parameter of the autoregressive model using the sufficient
statistic.

According to the second aspect of the invention, an outlier
and change point detection device that calculates the outlier
score and the change point score for sequentially input data
described with discrete and/or continuous variates so as to
detect outliers and change points comprises

a first model learning device that learns the generation
mechanism of the read data series as a time-series
statistical model specified by a finite number of parameters,
and

an outlier score calculator that reads the values of the
parameters obtained through learning by the first model
learning device, calculates the outlier score of each datum
based on the read parameters of the time-series model and the
input data, and outputs the results.

In the preferred construction, the outlier and change point
detection device further comprises, as a detection device to
detect the change point,

a moving average calculator that sequentially 
reads the outlier scores calculated by the outlier score 
calculator and calculates their moving average, 

a second model learning device that sequentially 
reads the moving average of the outlier scores 
calculated by the moving average calculator and learns 
the generation mechanism for the moving average series 
in the read score as a time-series statistic model 
specified by the finite number of parameters, and 

a change point score calculator that reads the 
parameter value obtained by learning by the second model 
learning device and calculates the outlier score for 
each moving average based on the read parameter of the 
time-series model and the moving average of the input 
outlier scores and outputs the outlier score for each 
moving average as the change point score of the original 
data. 

In another preferred construction, in case the sequentially
input data are described with continuous variates only, the
first model learning device sequentially reads the data
string of real number vector values and learns the
probability distribution generating the data string using the
autoregressive model, and further comprises a data updating
device to update the sufficient statistic of the
autoregressive model using the newly read data while
forgetting the past data and a parameter calculator to read
the sufficient statistic updated by the data updating device
and to calculate the parameter of the autoregressive model
using the sufficient statistic.

In another preferred construction, the outlier 
score calculator and the change point score calculator 
are considered as a single score calculator, further 
comprising as a device to determine the candidates of 
outliers and change points in the series for the data 
series described in discrete and/or continuous variates, 
a sort device to sort the data in descending order based 
on the outlier score and the change point score 
calculated by the score calculator and the display 
device that displays the data with higher scores 
according to the order sorted by the sort device as the 
candidates of outliers and change points. 

In another preferred construction, the outlier score
calculator and the change point score calculator are
considered as a single score calculator, and the device
further comprises, as a device to determine candidates of
outliers and change points in the series for the sequentially
input data described with discrete and/or continuous
variates, a score judgement device that outputs the data
whose outlier score or change point score calculated by the
score calculator exceeds the predetermined threshold as the
candidates of outliers or change points.

According to the third aspect of the invention, an
autoregressive model learning method in which a data string
of real number vector values is sequentially read and the
probability distribution generating the data string is
learned using an autoregressive model comprises the steps of

a data updating step of updating the sufficient statistic of
the autoregressive model using newly read data while
forgetting the past data, and

a parameter calculation step of reading the sufficient
statistic updated by the data updating step and calculating
the parameter of the autoregressive model using the
sufficient statistic.

According to another aspect of the invention, an outlier and
change point detection method to detect outliers and change
points by calculating the outlier score and the change point
score for sequentially input data described with discrete
and/or continuous variates comprises the steps of

a learning step of learning the generation mechanism of the
read data series as a time-series statistical model specified
by a finite number of parameters, and

an outlier score calculation step of reading the parameter
values obtained through learning by the learning step,
calculating the outlier score of each datum based on the read
parameters of the time-series model and the input data, and
outputting the results.

In the preferred construction, the method to detect the
change point further comprises a moving average calculation
step of sequentially reading the outlier scores calculated by
the outlier score calculation step and calculating their
moving average, a second learning step of sequentially
reading the moving average of the outlier scores calculated
by the moving average calculation step and learning the
generation mechanism of the moving average series as a
time-series statistical model specified by a finite number of
parameters, and a change point score calculation step of
reading the parameter values obtained through learning by the
second learning step, calculating the outlier score of each
moving average based on the read parameters of the
time-series model and the moving average of the input outlier
scores, and outputting that outlier score as the change point
score of the original data.

In another preferred construction, in case the sequentially
input data are described with continuous variates only, the
learning step sequentially reads the data string of real
number vector values and learns the probability distribution
generating the data string using the autoregressive model. It
updates the sufficient statistic of the autoregressive model
using newly read data while forgetting the past data, reads
the updated sufficient statistic and calculates the parameter
of the autoregressive model using the sufficient statistic.

In another preferred construction, the outlier 
score calculation step and the change point score 
calculation step are considered as a single score 
calculation step and further comprises a step in which, 
as a method to determine candidates of outliers and 
change points in the series for the data series 
described with discrete and/or continuous variates, the 
data are sorted in descending order based on the 
calculated outlier score and the change point score and 
the higher score data are displayed as the outlier and 
change point candidates according to the order of 
sorting. 

In another preferred construction, the outlier score
calculation step and the change point score calculation step
are considered as a single score calculation step, and the
method further comprises a step in which, as a method to
determine outlier and change point candidates in the series
for the sequentially input data described with discrete
and/or continuous variates, the data whose calculated outlier
score or change point score exceeds the predetermined
threshold are output as the candidates of outliers or change
points.

Thus, according to the present invention, the data are
updated at the same time as the past data are forgotten. The
present invention is therefore suitable for processing
time-series data and can improve the processing accuracy.

Other objects, features and advantages of the 
present invention will become clear from the detailed 
description given herebelow. 

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood more fully from the
detailed description given herebelow and from the
accompanying drawings of the preferred embodiment of the
invention, which, however, should not be taken to be
limitative to the invention, but are for explanation and
understanding only.

In the drawings:

Fig. 1 is a configuration diagram to show the 
configuration of a first embodiment of an AR model 
learning device according to the present invention; 

Fig. 2 is a flowchart to illustrate the operation 
of the first embodiment; 

Fig. 3 is a configuration diagram to show the 
configuration of second and third embodiments of a 
device to calculate the outlier score and the change 
point score according to the present invention;

Fig. 4 is a flowchart to illustrate the operation
of the second embodiment;

Fig. 5 is a flowchart to illustrate the operation
of the third embodiment;

Fig. 6 is a configuration diagram to show the 
configuration of a fourth embodiment of the device to 
determine outlier and change point candidates according 
to the present invention; 

Fig. 7 is a flowchart to illustrate the operation 
of the fourth embodiment; 

Fig. 8 is a configuration diagram to show the
configuration of a fifth embodiment of the device to
determine outlier and change point candidates according
to the present invention different from Fig. 6;

Fig. 9 is a flowchart to illustrate the operation
of the fifth embodiment; and

Fig. 10 is a graph to show an example of
experiment results on the change points using the score
calculator shown in Fig. 3.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

The preferred embodiment of the present invention will be
discussed hereinafter in detail with reference to the
accompanying drawings. In the following description, numerous
specific details are set forth in order to provide a thorough
understanding of the present invention. It will be obvious,
however, to those skilled in the art that the present
invention may be practiced without these specific details. In
other instances, well-known structures are not shown in
detail in order not to unnecessarily obscure the present
invention.

Referring to the attached figures, preferred embodiments of
the present invention are described below.

First of all, the notation is explained. $x$ represents data
of an n-dimensional vector value having real numbers as its
components, and $y$ represents data of an m-dimensional
vector value having discrete values as its components. $x$
and $y$ are collectively expressed as $z = (x, y)$. A series
comprising N pieces of data is expressed as
$z^N = z_1, z_2, \ldots, z_N$.

A method to calculate the "outlier score" for such a series
is described below.

Firstly, consider a statistical model that generates the data
$z$: $p(z_i \mid \theta) = p(x_i, y_i \mid \theta)$. This is
a probability density function defined on the range $Z$ over
which $z$ varies.

$\theta$ is a parameter that specifies the probability
density and generally consists of discrete parameters and
continuous-valued parameters. As a probability density
function of this type, the finite Gaussian mixture
distribution or the autoregressive model (a time-series
model) is used if $z$ comprises continuous variables, for
example. In the case of a time-series model, the probability
density of the i-th data $z_i$ depends on the series
$z^{i-1}$ so far, and the model becomes as follows:

$$p(z_i \mid z^{i-1}, \theta)$$

In general, to calculate the outlier score, the value of the
parameter $\theta$ is estimated (or "learned") from the data
series. Here, the parameter is learned using a "sequential
learning method", in which the data series is read
sequentially and the parameter is updated each time based on
the read data. Suppose that the parameter value obtained as a
result of learning after reading the data up to $z_i$ is
$\theta^{(i)}$. The outlier score for $z_{i+1}$ can be
calculated using this. For example, the logarithm score $s_L$
and the Hellinger score $s_H$ can be calculated by Formulas 1
and 2 below.

$$s_L = -\log p(z_{i+1} \mid \theta^{(i)})$$ (formula 1)

$$s_H = d_H\left(p(\cdot \mid z^{i}, \theta^{(i)}),\ p(\cdot \mid z^{i-1}, \theta^{(i-1)})\right)$$ (formula 2)

where $d_H$ is the squared Hellinger distance between two
probability densities and is defined by Formula 3 below.

$$d_H(p, q) = \sum_{y} \int \left(\sqrt{p(x, y)} - \sqrt{q(x, y)}\right)^2 dx$$ (formula 3)
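As an illustration only, the two scores can be sketched in Python for the special case where the learned density is a univariate normal distribution. The function names, the Gaussian assumption and the closed-form expression via the Bhattacharyya coefficient are assumptions introduced here, not part of the described device. The integrand of Formula 3 is used as written, without a factor of 1/2, so the distance ranges from 0 to 2.

```python
import math


def log_score(x, mean, var):
    """Logarithmic score (Formula 1) of datum x under N(mean, var).

    Computed directly as the negative log-density to avoid underflow.
    """
    return 0.5 * math.log(2.0 * math.pi * var) + (x - mean) ** 2 / (2.0 * var)


def hellinger_score(mean_new, var_new, mean_old, var_old):
    """Squared Hellinger distance (Formula 3) between two normal densities.

    Uses the identity: integral of (sqrt(p) - sqrt(q))^2 = 2 - 2 * BC,
    where BC is the integral of sqrt(p * q) (Bhattacharyya coefficient).
    """
    var_sum = var_new + var_old
    bc = math.sqrt(2.0 * math.sqrt(var_new * var_old) / var_sum) * math.exp(
        -(mean_new - mean_old) ** 2 / (4.0 * var_sum)
    )
    return 2.0 - 2.0 * bc


# A datum far from the learned mean gets a higher logarithmic score.
print(log_score(1.0, 0.0, 1.0), log_score(5.0, 0.0, 1.0))
# A large parameter change between two learning steps gets a higher Hellinger score.
print(hellinger_score(0.1, 1.0, 0.0, 1.0), hellinger_score(3.0, 1.0, 0.0, 1.0))
```

The logarithmic score measures how unexpected the new datum is under the previously learned density, whereas the Hellinger score measures how far the learned density itself moved after the datum was read.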



Next, the AR model used in the present invention is
described. The AR model is a time-series statistical model
that describes the probability distribution of a series of
n-dimensional real number vector data $x_t$. Firstly, the
series $\omega^N = \omega_1 \omega_2 \ldots \omega_N$ is
introduced as an auxiliary random variable. It is supposed to
have the same dimension as $x$ (n dimensions). Generally, the
k-degree AR model can be expressed by Formula 4 below.

$$\omega_t = \sum_{i=1}^{k} A_i \omega_{t-i} + \varepsilon$$ (formula 4)

Note that $A_i$ ($i = 1, \ldots, k$) is an n-dimensional
square matrix and $\varepsilon$ is a random variable
following a normal distribution with mean 0 and covariance
matrix $\Sigma$.

Suppose now that $x_t$ is given using $\omega_t$ as
$x_t = \omega_t + \mu$. If $x_{t-k}^{t-1}$ is defined by
Formula 5 below, the probability density function of $x_t$ is
given by Formula 6 below.

$$x_{t-k}^{t-1} = (x_{t-1}, x_{t-2}, \ldots, x_{t-k})$$ (formula 5)

$$p(x_t \mid x_{t-k}^{t-1} : \theta) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x_t - \hat{x}_t)^T \Sigma^{-1} (x_t - \hat{x}_t)\right)$$ (formula 6)

where $\hat{x}_t = \sum_{i=1}^{k} A_i \omega_{t-i} + \mu$ and
$\theta = (A_1, \ldots, A_k, \mu, \Sigma)$.
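For a scalar series (n = 1), the conditional density of the AR model reduces to a univariate normal density centred on the conditional mean. The following is a minimal sketch under that assumption; the function names and argument layout are hypothetical and chosen only for illustration.

```python
import math


def ar_conditional_mean(past, coeffs, mu):
    """Conditional mean of x_t given the k previous values.

    past[i] holds x_{t-1-i} and coeffs[i] holds the scalar A_{i+1}.
    """
    return mu + sum(a * (x - mu) for a, x in zip(coeffs, past))


def ar_density(x_t, past, coeffs, mu, sigma2):
    """Conditional density of x_t for a scalar AR model with noise variance sigma2."""
    x_hat = ar_conditional_mean(past, coeffs, mu)
    norm = math.sqrt(2.0 * math.pi * sigma2)
    return math.exp(-(x_t - x_hat) ** 2 / (2.0 * sigma2)) / norm


# AR model of degree 2 with mu = 1: the prediction from (x_{t-1}, x_{t-2}) = (2, 1).
print(ar_conditional_mean([2.0, 1.0], [0.5, 0.25], 1.0))
print(ar_density(1.5, [2.0, 1.0], [0.5, 0.25], 1.0, 1.0))
```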



An outlier level calculator sequentially reads the data
series from the beginning and, when it reads the i-th data
$z_i$, outputs its outlier level $s(z_i)$.
Then, referring to Fig. 1, an autoregressive model learning
device as described above is explained as a first embodiment.
Suppose here that the constant $r$, which expresses the speed
of forgetting, and the degree $k$ of the AR model are given
in advance. The constant $r$ is a value from 0 to 1. Because
new data are weighted by $r$ and the previous statistics by
$1 - r$ in the update rules of Formulas 7 and 8, a larger
constant means quicker forgetting of the past data.

As shown in the figure, the first embodiment comprises a
forgetting type sufficient statistic calculator 11, which is
the data updating device and receives the input data, and a
parameter calculator 12, which receives the output of the
calculator 11 and outputs the parameter value.

The forgetting type sufficient statistic calculator 11 is a
device to calculate the forgetting type sufficient statistic
of the AR model. The forgetting type sufficient statistic is
the sufficient statistic corrected so that the influence of
older data becomes smaller. The sufficient statistic here
means the n-dimensional vector $\mu$ and the $k+1$
n-dimensional square matrices $C_j$ ($j = 0, 1, \ldots, k$).
The forgetting type sufficient statistic calculator 11 has a
function to store the data of the past $k$ time points for
the k-degree AR model.

The parameter calculator 12 calculates the value of the
parameter $\theta = (A_1, \ldots, A_k, \mu, \Sigma)$ of the
AR model based on the given sufficient statistic.

Referring to the flowchart of Fig. 2, the operation of the
first embodiment is described. Firstly, the parameters stored
in the parameter calculator 12 are initialized before data
reading. Then, every time the t-th data is input, the
following steps are executed.

When data $x_t$ is input, the forgetting type sufficient
statistic calculator 11 deletes the oldest data it has stored
(Step 201) and stores the newest data $x_t$ instead to obtain
the data string $x_t, x_{t-1}, \ldots, x_{t-k}$ (Step 202).

Using this, the forgetting type sufficient statistic
calculator 11 updates the sufficient statistic $\mu$, $C_j$
($j = 0, \ldots, k$) it keeps by the update rules expressed
by Formulas 7 and 8 shown below (Step 203) and sends the
obtained sufficient statistic to the parameter calculator 12
(Step 204).

$$\mu := (1 - r)\mu + r x_t$$ (formula 7)

$$C_j := (1 - r) C_j + r (x_t - \mu)(x_{t-j} - \mu)^T$$ (formula 8)

The parameter calculator 12 determines the solution of the
simultaneous equations of Formula 9 below, with $\bar{A}_i$
($i = 1, \ldots, k$) as the unknowns (Step 205). Note that
$C_{-j} = C_j^T$.

$$C_j = \sum_{i=1}^{k} \bar{A}_i C_{j-i} \quad (j = 1, \ldots, k)$$ (formula 9)

The parameter calculator 12 substitutes the determined
solution $\bar{A}_i$ for $A_i$ and calculates the parameter
$\theta$ using Formulas 10 and 11 below (Step 206).

$$\hat{x}_t := \sum_{i=1}^{k} A_i (x_{t-i} - \mu) + \mu$$ (formula 10)

$$\Sigma := (1 - r)\Sigma + r (x_t - \hat{x}_t)(x_t - \hat{x}_t)^T$$ (formula 11)

Then, it outputs the obtained parameter
$\theta = (A_1, \ldots, A_k, \mu, \Sigma)$ (Step 207).

Then, referring to Fig. 3, second and third embodiments are
described below.

As shown in the figure, these embodiments comprise
time-series model learning devices 21 and 24 corresponding to
the first and second model learning devices described above,
a moving average calculator 22, and a score calculator 23
containing both the outlier score calculator and the change
point score calculator described above. The second embodiment
is realized by the time-series model learning device 21 and
an outlier score calculator, and the third embodiment is
realized by the time-series model learning devices 21 and 24
and the score calculator 23.
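The update cycle of the first embodiment (Steps 201 to 207), on which the time-series model learning devices 21 and 24 rely, might be sketched as follows for the scalar case n = 1. This is a minimal sketch under stated assumptions, not the device itself: the class name `ScalarARLearner`, the helper `solve_linear`, the initial values and the fallback for a singular system are all introduced here for illustration.

```python
from collections import deque


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


class ScalarARLearner:
    """Hypothetical scalar (n = 1) sketch of calculators 11 and 12."""

    def __init__(self, k, r, mu=0.0, sigma2=1.0):
        self.k, self.r = k, r
        self.mu, self.sigma2 = mu, sigma2
        self.c = [sigma2] + [0.0] * k        # C_0, ..., C_k (assumed initial values)
        self.a = [0.0] * k                   # A_1, ..., A_k
        self.window = deque([mu] * (k + 1), maxlen=k + 1)

    def update(self, x):
        r = self.r
        # Steps 201-202: drop the oldest datum and store the newest one,
        # so that window[j] holds x_{t-j}.
        self.window.appendleft(x)
        # Step 203: forgetting-type sufficient statistics (Formulas 7 and 8).
        self.mu = (1 - r) * self.mu + r * x
        for j in range(self.k + 1):
            product = (x - self.mu) * (self.window[j] - self.mu)
            self.c[j] = (1 - r) * self.c[j] + r * product
        # Step 205: solve Formula 9; C_{-j} equals C_j in the scalar case.
        system = [[self.c[abs(j - i)] for i in range(1, self.k + 1)]
                  for j in range(1, self.k + 1)]
        try:
            self.a = solve_linear(system, self.c[1:])
        except ValueError:
            pass                             # assumption: keep the previous A_i
        # Step 206: prediction and variance update (Formulas 10 and 11).
        x_hat = self.mu + sum(a * (self.window[i + 1] - self.mu)
                              for i, a in enumerate(self.a))
        self.sigma2 = (1 - r) * self.sigma2 + r * (x - x_hat) ** 2
        # Step 207: output the parameter theta = (A_1..A_k, mu, sigma2).
        return self.a, self.mu, self.sigma2
```

Calling `update` once per datum keeps the memory footprint fixed at k + 1 stored values, which is what allows the learning to run sequentially rather than in batch.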

The time-series model learning devices 21 and 24 are devices
that learn the parameters of the probability density function
of a time-series model while sequentially reading data.

The time-series model learning device 21 is a device to learn
the probability density function related to the input data,
and the probability density function $F_p$ used here is
expressed by Formula 12 below.

$$F_p = p(z_t \mid z^{t-1}, \theta)$$ (formula 12)

The other time-series model learning device 24 is a device to
learn the probability density function related to the moving
average series of the scores calculated by the moving average
calculator 22, and uses a k-degree AR model with a single
variate. With $\alpha_t$ denoting the moving average at time
$t$, the probability density function $F_{qk}$ is expressed
by Formula 13 below.

$$F_{qk} = q(\alpha_t \mid \alpha_{t-k}^{t-1}, \xi)$$ (formula 13)

The score calculator 23 reads the parameters and data of the
probability density functions $F_p$ and $F_{qk}$ and
calculates the score for the data $x_t$. In addition to its
calculation function, the score calculator 23 has a function
to save the latest $u(z)$ pieces of data of the $z_t$ series,
the latest $u(\alpha)$ pieces of data of the $\alpha_t$
series, and the previous parameters $\theta$ and $\xi$. In
the case of the probability density function $F_{qk}$ using
the k-degree AR model, for example, the logarithm score or
the Hellinger score can be calculated under the condition
$u(\alpha) = k$.

The moving average calculator 22 is a device that calculates
and outputs the moving average over the latest $T$ values of
the sequentially input real number data. For this purpose,
the moving average calculator 22 has a function to store $T$
real numbers internally.
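A moving average calculator of this kind can be sketched with a fixed-length buffer. The class name `MovingAverage` and the choice to pre-fill the buffer with an initial value are assumptions made for illustration.

```python
from collections import deque


class MovingAverage:
    """Keeps the latest T real numbers and returns their average."""

    def __init__(self, T, initial=0.0):
        # The buffer is pre-filled so that an average is defined from the first input.
        self.values = deque([initial] * T, maxlen=T)

    def update(self, value):
        # Appending to a full deque discards the oldest stored value.
        self.values.append(value)
        return sum(self.values) / len(self.values)


ma = MovingAverage(T=3)
for score in [3.0, 6.0, 9.0, 12.0]:
    print(ma.update(score))  # prints 1.0, 3.0, 6.0, 9.0 in turn
```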

The device related to the second embodiment works in the
following order. Referring to the flowchart of Fig. 4, the
operation of the second embodiment is described below.

The entire system is initialized first. Predetermined values
are set in the devices that store the parameters and data.
The device shown in the figure works as follows every time
the t-th data $z_t = (x_t, y_t)$ is input.

The time-series model learning device 21 and the score
calculator 23 receive the input of data $z_t$ (Step 401).

The score calculator 23 calculates the score for data $z_t$
as the outlier score $s_t$ based on the parameter $\theta$ of
the probability density function $F_p$ input and saved in the
past, the input data $z_t$ and the past data
$z_{t-1}, z_{t-2}, \ldots, z_{t-u(z)}$ (Step 402).

Then, the obtained outlier score $s_t$ is sent to the moving
average calculator 22 and at the same time output to outside
(Step 403).

The device related to the third embodiment works in the
following order, following the second embodiment above. The
operation of the third embodiment is described below with
reference to the flowchart of Fig. 5.

When the moving average calculator 22 receives the score
$s_t$ from the score calculator 23 (Step 501), it erases the
oldest saved score and saves the newly input score $s_t$
(Step 502).

Then, the moving average calculator 22 calculates the average
$\alpha_t$ of the $T$ saved scores and sends it to the
time-series model learning device 24 (Step 503).

The time-series model learning device 24 works as explained
in the first embodiment above, updates the parameter $\xi$ of
the probability density function $F_{qk}$ using the k-degree
AR model with a single variate (Step 504), and sends the
obtained parameter $\xi$ and the score $\alpha_t$ to the
score calculator 23 (Step 505).

The score calculator 23 calculates the score of the
moving-average score, that is, the change point score, based
on the parameter $\xi$ of the probability density function
$F_q$ expressed by Formula 14 below, which was input and
saved in the past, the input data $\alpha_t$ and the past
data $\alpha_{t-1}, \alpha_{t-2}, \ldots, \alpha_{t-u(\alpha)}$
(Step 506), and outputs the obtained score (Step 507).

$$F_q = q(\alpha_t \mid \alpha^{t-1}, \xi)$$ (formula 14)
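The two-stage flow of the second and third embodiments (outlier score, moving average, second learning stage, change point score) might be sketched as below. To keep the example short, both learning devices are replaced by a stand-in scalar AR model of degree 1 with forgetting, scored with the logarithmic score. The names `ForgettingAR1` and `change_point_scores`, the initial values, the variance floor and the default values of `r` and `T` are all assumptions made for this sketch.

```python
import math
from collections import deque


class ForgettingAR1:
    """Stand-in learner (an assumption): scalar AR model of degree 1."""

    def __init__(self, r):
        self.r = r
        self.mu, self.c0, self.c1 = 0.0, 1.0, 0.0
        self.a, self.sigma2, self.prev = 0.0, 1.0, 0.0

    def predict(self):
        return self.mu + self.a * (self.prev - self.mu)

    def score_then_learn(self, x):
        # Logarithmic score under the parameter saved before x arrived.
        err = x - self.predict()
        score = (0.5 * math.log(2.0 * math.pi * self.sigma2)
                 + err * err / (2.0 * self.sigma2))
        # Forgetting-type update of the statistics and parameters.
        r = self.r
        self.mu = (1 - r) * self.mu + r * x
        self.c0 = (1 - r) * self.c0 + r * (x - self.mu) ** 2
        self.c1 = (1 - r) * self.c1 + r * (x - self.mu) * (self.prev - self.mu)
        if self.c0 > 1e-12:
            self.a = self.c1 / self.c0
        err = x - self.predict()
        # The floor on the variance is a numerical guard added for this sketch.
        self.sigma2 = max((1 - r) * self.sigma2 + r * err * err, 1e-9)
        self.prev = x
        return score


def change_point_scores(series, r=0.02, T=5):
    """Return a list of (outlier score, change point score) per datum."""
    first, second = ForgettingAR1(r), ForgettingAR1(r)
    window = deque([0.0] * T, maxlen=T)
    results = []
    for x in series:
        outlier = first.score_then_learn(x)      # Steps 401-403
        window.append(outlier)                   # Steps 501-502
        average = sum(window) / T                # Step 503
        change = second.score_then_learn(average)  # Steps 504-507
        results.append((outlier, change))
    return results


# A small oscillation whose level jumps from 0 to 10 at t = 300.
series = [0.1 * math.sin(t) for t in range(300)]
series += [10.0 + 0.1 * math.sin(t) for t in range(300, 400)]
scores = change_point_scores(series)
print(max(range(len(scores)), key=lambda t: scores[t][0]))
```

Scoring each datum before the model learns from it mirrors the description that the score calculator uses the parameter "input and saved in the past".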

Then, referring to Fig. 6, a fourth embodiment is described
below.

This figure shows data 31, an outlier score/change point
score calculator 32, which is the score calculator described
above, scored data 33, a sort device 34 and a display device
35. The data 31 is a database storing data series with a
finite length. The outlier score/change point score
calculator 32 is a device to calculate the outlier score and
the change point score as described in the second or third
embodiment above. The scored data 33 receives and stores the
outputs from the outlier score/change point score calculator
32. The sort device 34 sorts the data in descending order of
score using the outlier score and the change point score.

The devices shown in the figure work in the following order.
The operation of the fourth embodiment is described below
with reference to the flowchart of Fig. 7.

The outlier score/change point score calculator 32 accesses
the data 31, sequentially reads the data series and
calculates the outlier score and the change point score for
each datum (Step 701), and then sends a three-element set of
the data, the outlier score and the change point score to the
scored data 33 (Step 702).

The scored data 33 stores the sent data (Step 703).

The sort device 34 accesses the database of the scored data
33, sorts the data stored there in descending order of score
using the outlier score and the change point score, and sends
them to the display device 35 (Step 704).

The display device 35 lists and displays the two types of
sorted data sent according to the sort order (Step 705).
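Steps 704 and 705 amount to two descending sorts over the stored three-element sets. A minimal sketch follows, in which the function name `rank_candidates`, the `top_n` cut-off and the sample records are hypothetical.

```python
def rank_candidates(scored, top_n=3):
    """Return the top_n records by outlier score and by change point score.

    Each record is a (datum, outlier_score, change_point_score) tuple.
    """
    by_outlier = sorted(scored, key=lambda rec: rec[1], reverse=True)[:top_n]
    by_change = sorted(scored, key=lambda rec: rec[2], reverse=True)[:top_n]
    return by_outlier, by_change


# Hypothetical scored data, standing in for the contents of the scored data 33.
scored_data = [("d1", 0.2, 0.1), ("d2", 5.0, 0.3), ("d3", 1.1, 4.2), ("d4", 0.4, 2.0)]
outliers, changes = rank_candidates(scored_data, top_n=2)
for datum, outlier_score, _ in outliers:
    print("outlier candidate", datum, outlier_score)
for datum, _, change_score in changes:
    print("change point candidate", datum, change_score)
```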

Next, referring to Fig. 8, a fifth embodiment is described.

The figure shows data 41, an outlier score/change point score
calculator 42, which is the score calculator as described
above, scored data 43, a score judgement device 44 and a
display device 45. The score judgement device 44 is provided
in Fig. 8 instead of the sort device 34 in Fig. 6.

The data 41 is a database storing data series with a finite
length. The outlier score/change point score calculator 42 is
a device to calculate the outlier score and the change point
score as described in the second or third embodiment above.
The scored data 43 receives and stores the outputs from the
outlier score/change point score calculator 42. The score
judgement device 44 accesses the database of the scored data
43, selects from the stored data the data whose outlier score
or change point score exceeds the predetermined threshold,
and sends them to the display device 45.

The devices shown in the figure work in the following order.
The operation of the fifth embodiment is described below with
reference to the flowchart of Fig. 9.

The outlier score/change point score calculator 42 accesses
the database of the data 41 and, while reading the data
series sequentially, calculates the outlier score and the
change point score for each datum (Step 901).

A three-element set consisting of the data, the outlier score
and the change point score is sent sequentially to the
database of the scored data 43 (Step 902).

The database of the scored data 43 stores the sent data
(Step 903).

The score judgement device 44 accesses the database of the
scored data 43, selects from the stored data the data whose
outlier score or change point score exceeds the predetermined
threshold, and sends them to the display device 45
(Step 904).

The display device 45 displays the two types of sent data as
they are or lists them according to the sort order
(Step 905).
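The threshold selection of Step 904 might be sketched as follows. The function name `judge_scores`, the use of separate thresholds for the two scores and the sample records are assumptions made for illustration.

```python
def judge_scores(scored, outlier_threshold, change_threshold):
    """Select the records whose scores exceed the given thresholds.

    Each record is a (datum, outlier_score, change_point_score) tuple.
    """
    outlier_candidates = [rec for rec in scored if rec[1] > outlier_threshold]
    change_candidates = [rec for rec in scored if rec[2] > change_threshold]
    return outlier_candidates, change_candidates


# Hypothetical scored data, standing in for the contents of the scored data 43.
stored = [("d1", 0.2, 0.1), ("d2", 5.0, 0.3), ("d3", 1.1, 4.2), ("d4", 0.4, 2.0)]
outlier_hits, change_hits = judge_scores(stored, outlier_threshold=1.0, change_threshold=1.0)
print([rec[0] for rec in outlier_hits])
print([rec[0] for rec in change_hits])
```

Unlike the sort of the fourth embodiment, this selection needs only one pass over the stored data and keeps the records in their original order.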

Next, referring to Fig. 10, results for actual data analyzed
using the score calculator for the outlier score and the
change point score described with reference to Fig. 3 are
described.

This experiment was conducted in order to find change points.
In this example, the daily data of the Tokyo Stock Price
Index (TOPIX) (1946-1998) are analyzed and the results for
the period from 1985 to 1995 are shown. The graph shows the
original data and the change point scores attached to them.
The data are pre-processed. In other words, if the original
one-dimensional series is $x_t$, it is converted to
$(x_t, x_t - x_{t-1})$. Such conversion is expected to help
detect sharp changes of the trend in addition to changes of
the average. According to this analysis result, the change
point score is high for so-called Black Monday and in the
period of generation and collapse of the bubble economy. The
graph shows a quite high peak on the day following Black
Monday.
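The pre-processing described for this experiment pairs each value with its difference from the previous value. A minimal sketch follows; the function name `with_differences` is hypothetical, and dropping the first datum, which has no predecessor, is an assumption.

```python
def with_differences(series):
    """Map x_1, x_2, ... to the pairs (x_t, x_t - x_{t-1}) for t >= 2."""
    return [(cur, cur - prev) for prev, cur in zip(series, series[1:])]


print(with_differences([1.0, 3.0, 2.0]))  # [(3.0, 2.0), (2.0, -1.0)]
```

The converted series is two-dimensional, so it would be learned with the vector form of the AR model rather than a single-variate one.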

Though the above explanation refers to the functional blocks
shown in the figures, the functions can be freely distributed
by separation or unification as far as the above functions
are satisfied. The above explanation does not limit the
present invention.

As described above, the present invention has the effect that
the extent of a statistical outlier or change point appearing
in time-series data is measured and presented as the outlier
score or the change point score, and that their detection is
enabled with high accuracy. The reasons are as follows:

First of all, the time-series model learning 
device that learns the generation mechanism of the read 
data series as the time-series statistic model is used 
for the data string input sequentially. 

In addition, the score calculator calculates the 
outlier score of each data based on the time-series 
model parameter and the input data. 

Further, the outlier and the change point are 
detected through calculation of the outlier score and 
the change point score by combining the moving average 
calculator to calculate the moving average of the 
outlier scores, the time-series model learning device to 
learn the mechanism for generation of moving average 
series as the time-series statistical model and a score 
calculator that further calculates the outlier score of 
the moving average based on the moving average of the 
outlier scores and outputs the result as the change 
point score of the original data. 

Although the invention has been illustrated and described
with respect to an exemplary embodiment thereof, it should be
understood by those skilled in the art that the foregoing and
various other changes, omissions and additions may be made
therein and thereto, without departing from the spirit and
scope of the present invention. Therefore, the present
invention should not be understood as limited to the specific
embodiment set out above but to include all possible
embodiments which can be embodied within a scope encompassed
and equivalents thereof with respect to the features set out
in the appended claims.



